Mini Project Visualisation


Sanrio is a major Japanese entertainment company, with a market cap of £6.42 billion. Founded in 1960 under Shintaro Tsuji, the company has created over 450 characters including the global icon Hello Kitty. Beyond character design, the company’s influence extends through the production of theme parks, merchandising, animated films, video games, and collaborations with high-profile brands such as Starbucks, Adidas, and Swarovski. To engage its’ international fanbase, Sanrio launched the Sanrio Character Ranking event in 1986, originally published in a Japanese newspaper and now conducted online. This attracts widespread participation ( 65 million votes in 2025 ) and generates rich data that reveals trends of fan preferences across countries and regions.

My aim for this visualisation is to:

Sourcing data

The data was sourced online through ( Load in the repository for github that has since gone, explain how and why its gone etc and upload the excel file that I grabbed from git anyways ).

# Load in libraries
library(ggalluvial)
library(tidyverse)
library(ggplot2)
library(plotly)
library(readxl)
library(dplyr)

Data Cleaning

The data contains 3 sheets - total ranking by year, oversea ranking and regional ranking. I am interested in the regional rankings, as it contains the year, country, character and rank. I filtered out countries aside from South Korea, China, Taiwan, USA and UK as I felt that these were accurate representations of Sanrio’s fanbase.

#Loading the data in
sanriocharacterranking <- read_excel("~/my6422worksheets/sanriocharacterranking.xlsx", 
                                     sheet = 3)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Seperating only the countries I want to look into

countries <- c('United Kingdom', 'United States', 'Taiwan', 'China', 'South Korea')
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#Cleaning the data, changing the name of the 'country/region' variable and removing ranks higher than 3.

sanrio_clean <- sanriocharacterranking %>% 
  select(year, character, country = `country/region`, ranking) %>% 
  filter( ranking < 4,
         country %in% countries)
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#Just checking it's all in order
glimpse(sanrio_clean)
## Rows: 111
## Columns: 4
## $ year      <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, …
## $ character <chr> "Gudetama", "Yoshikitty", "Kuromi", "Gudetama", "Pompompurin…
## $ country   <chr> "South Korea", "South Korea", "South Korea", "Taiwan", "Taiw…
## $ ranking   <dbl> 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, …

Initial (bad) visualisation

With this cleaned data, I first wanted to create an alluvial plot with seperate facets for each country. The idea here was to show my aims clearly, but it quickly became evident that this approach was not working - there were too many characters and colours and it was hard to follow what was happening ( as seen below ). On top of this, the data was presenting an issue. For most characters, there wasn’t much movement from year to year, and for most characters they were only in the rankings once and never re-appeared. This results in the plot being both overcrowded within the legend, and dull with movement - an example of a bad visualisation.

#Plotting an alluvial with the previous data. 
ggplot(
  data = sanrio_clean,
  aes(x = year, stratum = ranking, alluvium = interaction(character, country),
    y = 1, fill = character)) +
  geom_stratum(colour = NA) +
  geom_alluvium(alpha = 0.4) +
  facet_wrap(~ country) +
  labs(
    title = "Sanrio Character Ranking Changes Over Time by Country",
    x = "Year",
    y = ""
  ) +
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.line.y = element_blank()
  )

Data wrangling

After reflecting upon my data further, I realised that the clearest way to show my data would involve transforming it. Including all aspects of the data - the year, the specific ranking placement, the character name and the country, would not create a good visualisation. Instead, I had the idea of transforming the data into how many times each character had been within the ’ top 3’ rankings, separated by country. This would give each character more information to work with and show across years.

#Removing data from before 2020, and collecting information on how often each character was within the top 3 by year. Creating a new column for this infomration. 

sanrio_top3 <- sanrio_clean %>%
  filter(year > 2019) %>% 
  group_by(year, character, country) %>%
  summarise(
    total_top3 = n(),
    ranking = paste(sort(ranking), collapse = ", "),
    .groups = "drop"
  )
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# Quick view of the wrangled data.

glimpse(sanrio_top3)
## Rows: 90
## Columns: 5
## $ year       <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,…
## $ character  <chr> "Aggretsuko", "Cinnamoroll", "Cinnamoroll", "Cinnamoroll", …
## $ country    <chr> "United States", "China", "Taiwan", "United Kingdom", "Unit…
## $ total_top3 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ ranking    <chr> "3", "1", "2", "3", "2", "3", "1", "1", "1", "2", "3", "3",…

Final visualisation

Here, I created an interactive, animated bar chart using plotly. This helped my goals of:

  • Showcasing preference trends over time ( viewing character popularity year by year )

  • Showcasing preference by region ( each chart is split by country )

#Plotting an interactive, animated bar chart with plotly. With the new variable of total_top3 on the X axis, character name on the Y axis, coloured by country and framed by year. Adding ranking into the hover info for further detail.

plot_ly(sanrio_top3, 
        x = ~total_top3, 
        y = ~character, 
        color = ~country, 
        frame = ~year, 
        type = 'bar', 
        orientation = 'h', 
        text = ~paste("Character:", character,
                      "<br>Country:", country,
              "<br>Ranking:", ranking), 
        hoverinfo = 'text') %>%
  layout(title = "Top Characters per Year by Country",
         xaxis = list(title = "Number of times each character is in the top 3"),
         yaxis = list(title = ""),
         barmode = 'stack') %>%
  animation_opts(frame = 1000, transition = 500, redraw = TRUE) %>%
  animation_slider(currentvalue = list(prefix = "Year: "))